Fewest repetitions in infinite binary words
A square is the concatenation of a nonempty word with itself. A word has
period p if its letters at distance p match. The exponent of a nonempty word is
the quotient of its length over its smallest period.
In this article we give a proof of the fact that there exists an infinite
binary word which contains finitely many squares and simultaneously avoids
words of exponent larger than 7/3. Our infinite word contains 12 squares, which
is the smallest possible number of squares to get the property, and 2 factors
of exponent 7/3. These are the only factors of exponent larger than 2. The
value 7/3 introduces what we call the finite-repetition threshold of the binary
alphabet. We conjecture it is 7/4 for the ternary alphabet, like its repetitive
threshold
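
The definitions of period, exponent, and square used in the abstract above can be illustrated with a short Python sketch. It is not taken from the article; the function names `smallest_period`, `exponent`, and `distinct_squares` are hypothetical, and the square search is a brute-force check intended only for short words.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that w[i] == w[i + p] wherever both exist.

    Uses the border array (KMP failure function); w must be nonempty.
    """
    n = len(w)
    border = [0] * n  # border[i] = length of the longest proper border of w[:i+1]
    k = 0
    for i in range(1, n):
        while k > 0 and w[i] != w[k]:
            k = border[k - 1]
        if w[i] == w[k]:
            k += 1
        border[i] = k
    return n - border[-1]


def exponent(w: str) -> Fraction:
    """Length of w divided by its smallest period."""
    return Fraction(len(w), smallest_period(w))


def distinct_squares(w: str) -> set:
    """All distinct factors of w of the form uu with u nonempty (brute force)."""
    n = len(w)
    found = set()
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if w[i:i + half] == w[i + half:i + 2 * half]:
                found.add(w[i:i + 2 * half])
    return found
```

For instance, `exponent("0110110")` evaluates to `Fraction(7, 3)`, since the word has length 7 and smallest period 3.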
Avoiding conjugacy classes on the 5-letter alphabet
We construct an infinite word w over the 5-letter alphabet such that for every factor f of w of length at least two, there exists a cyclic permutation of f that is not a factor of w. In other words, w does not contain a non-trivial conjugacy class. This proves the conjecture in Gamard et al. [TCS 2018
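
The property stated in the abstract above can be tested on finite words with a brute-force sketch. This is only an illustration of the definition, not the construction from the paper; the helper names `factors`, `rotations`, and `contains_conjugacy_class` are hypothetical.

```python
def factors(w: str, min_len: int = 2) -> set:
    """All factors (contiguous substrings) of w of length at least min_len."""
    n = len(w)
    return {w[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def rotations(f: str) -> set:
    """All cyclic permutations (conjugates) of f."""
    return {f[k:] + f[:k] for k in range(len(f))}


def contains_conjugacy_class(w: str) -> bool:
    """True if some factor of length >= 2 has all of its conjugates as factors of w."""
    facs = factors(w)
    return any(rotations(f) <= facs for f in facs)
```

Under this definition, `contains_conjugacy_class("abab")` is true (both `ab` and `ba` occur), while `contains_conjugacy_class("abc")` is false.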
Characterization of some binary words with few squares
Thue proved that the factors occurring infinitely many times in square-free words over {0,1,2} avoiding the factors in {010,212} are the factors of the fixed point of the morphism 0 → 012, 1 → 02, 2 → 1. He similarly characterized square-free words avoiding {010,020} and {121,212} as the factors of two morphic words. In this paper, we exhibit smaller morphisms to define these two square-free morphic words and we give such characterizations for six types of binary words containing few distinct squares
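
As a small illustration of the morphism quoted in the abstract above, the following Python sketch iterates 0 → 012, 1 → 02, 2 → 1 from the letter 0 and checks a finite prefix of the fixed point for squares and for the factors 010 and 212. The function names are hypothetical, and checking a prefix is only a sanity test, not a proof.

```python
MORPHISM = {"0": "012", "1": "02", "2": "1"}


def apply_morphism(word: str) -> str:
    """Replace every letter by its image under the morphism."""
    return "".join(MORPHISM[c] for c in word)


def fixed_point_prefix(length: int) -> str:
    """Prefix of the fixed point obtained by iterating the morphism from '0'."""
    word = "0"
    while len(word) < length:
        word = apply_morphism(word)
    return word[:length]


def has_square(word: str) -> bool:
    """Brute-force test for a factor of the form uu with u nonempty."""
    n = len(word)
    return any(
        word[i:i + h] == word[i + h:i + 2 * h]
        for i in range(n)
        for h in range(1, (n - i) // 2 + 1)
    )
```

The first iterates are `0`, `012`, `012021`, and `012021012102`; each one is a prefix of the next because the image of 0 starts with 0.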
Infinite binary words containing repetitions of odd period
A square is the concatenation of a nonempty word with itself. A word has period p if its letters at distance p match. The exponent of a nonempty word is its length divided by its smallest period. In this article, we give some new results on the trade-off between the number of squares and the number of cubes in infinite binary words whose square factors have odd periods
Finite-Repetition threshold for infinite ternary words
The exponent of a word is the ratio of its length over its smallest period.
The repetitive threshold r(a) of an a-letter alphabet is the smallest rational
number for which there exists an infinite word whose finite factors have
exponent at most r(a). This notion was introduced in 1972 by Dejean who gave
the exact values of r(a) for every alphabet size a as it has been eventually
proved in 2009.
The finite-repetition threshold for an a-letter alphabet refines the above
notion. It is the smallest rational number FRt(a) for which there exists an
infinite word whose finite factors have exponent at most FRt(a) and that
contains a finite number of factors with exponent r(a). It is known from
Shallit (2008) that FRt(2)=7/3.
With each finite-repetition threshold is associated the smallest number of
r(a)-exponent factors that can be found in the corresponding infinite word. It
has been proved by Badkobeh and Crochemore (2010) that this number is 12 for
infinite binary words whose maximal exponent is 7/3.
We show that FRt(3)=r(3)=7/4 and that the bound is achieved with an infinite
word containing only two 7/4-exponent words, the smallest number.
Based on deep experiments we conjecture that FRt(4)=r(4)=7/5. The question
remains open for alphabets with more than four letters.
Keywords: combinatorics on words, repetition, repeat, word powers, word
exponent, repetition threshold, pattern avoidability, word morphisms.
Comment: In Proceedings WORDS 2011, arXiv:1108.341
Binary Jumbled String Matching for Highly Run-Length Compressible Texts
The Binary Jumbled String Matching problem is defined as: Given a string s
over {a,b} of length n and a query (x,y), with x,y non-negative
integers, decide whether s has a substring t with exactly x a's and
y b's. Previous solutions created an index of size O(n) in a pre-processing
step, which was then used to answer queries in constant time. The fastest
algorithms for construction of this index have running time O(n^2/log n)
[Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or O(n^2/log^2 n) in
the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index
constructed directly from the run-length encoding of s. The construction time
of our index is O(n + ρ^2 log ρ), where O(n) is the time for computing
the run-length encoding of s and ρ is the length of this encoding; this
is no worse than previous solutions if ρ = O(n/log n) and better if ρ = o(n/log n). Our index L can be queried in O(log ρ) time. While
|L| = O(min(n, ρ^2)) in the worst case, preliminary investigations have
indicated that |L| may often be close to ρ. Furthermore, the algorithm
for constructing the index is conceptually simple and easy to implement. In an
attempt to shed light on the structure and size of our index, we characterize
it in terms of the prefix normal forms of s introduced in [Fici and Lipták,
DLT 2011].
Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures
on size of Corner Index (we no longer conjecture it to be always linear in
size of RLE); removed experimental part on random strings (these are valid
but limited in their predictive power w.r.t. general strings); v3 published
in IP
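
To make the jumbled matching problem in the entry above concrete, here is a minimal Python sketch of the classical size-O(n) index, assuming the two letters are written `a` and `b`. It is not the run-length-based index proposed in the paper: it is a naive quadratic-time construction that relies on the fact that, for a fixed window length m, the number of a's in a length-m substring takes every value between its minimum and its maximum. The names `build_index` and `query` are hypothetical.

```python
def build_index(s: str, one: str = "a"):
    """For each length m, store the min and max number of `one` letters
    over all substrings of s of length m (naive O(n^2) construction)."""
    n = len(s)
    prefix = [0]
    for c in s:
        prefix.append(prefix[-1] + (c == one))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        lo[m], hi[m] = min(counts), max(counts)
    return lo, hi


def query(index, x: int, y: int) -> bool:
    """Does s have a substring with exactly x a's and y b's?"""
    lo, hi = index
    m = x + y
    if m >= len(lo):  # requested substring is longer than s
        return False
    # Sliding a window by one position changes the count by at most 1,
    # so every value between lo[m] and hi[m] is attained.
    return lo[m] <= x <= hi[m]
```

With `index = build_index("aabab")`, the query `(2, 1)` succeeds (substring `aab`) and the query `(3, 0)` fails.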
Left Lyndon tree construction
We extend the left-to-right Lyndon factorisation of a word to the left Lyndon tree construction of a Lyndon word. It yields an algorithm to sort the prefixes of a Lyndon word according to the infinite ordering defined by Dolce et al. (2019). A straightforward variant computes the left Lyndon forest of a word. All algorithms run in linear time on a general alphabet (letter-comparison model)
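
For context, the left-to-right Lyndon factorisation that the abstract above takes as its starting point can be computed with Duval's standard linear-time algorithm. The Python sketch below shows only that classical factorisation, not the left Lyndon tree construction of the paper; the function name `lyndon_factorization` is hypothetical.

```python
def lyndon_factorization(w: str) -> list:
    """Duval's algorithm: factor w into a non-increasing sequence of Lyndon words,
    using only letter comparisons."""
    n = len(w)
    i = 0
    result = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it remains a power of a Lyndon word
        # followed by a prefix of that word.
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i       # the whole run so far becomes one longer Lyndon word
            else:
                k += 1      # still matching the current period
            j += 1
        # Output the repeated Lyndon factor of length j - k.
        while i <= k:
            result.append(w[i:i + j - k])
            i += j - k
    return result
```

For example, `lyndon_factorization("banana")` returns `["b", "an", "an", "a"]`, and a Lyndon word such as `aab` is returned as a single factor.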
Corpus-building and corpus-based musicology for the Early Modern Period: Towards a complete Electronic Corpus of Lute Music... and beyond
Sustainable musicology must make use of as wide a set of contributors, and hear as many voices, as possible. As the online Electronic Corpus of Lute Music approaches its 20th year, a change of approach – embracing enthusiast and scholarly collections alike – is increasing the size of the encoded corpus tenfold and could allow us to provide metadata on almost all of the over 60,000 items in the known lute repertory. This approach brings challenges and limitations, as well as opportunities for scholarship beyond what has previously been possible.
The new sub-corpora have diverse editorial strategies and metadata quality, sometimes lacking basic information such as instrumental tuning. On the other hand, a combination of resources to give even 15-20% of the known repertory, combined with metadata to evaluate biases in that sample, could prove invaluable for corpus studies, and also help discover hitherto unrecognised connections and quotations between works.
As many vocal pieces of the period are now available online in digital facsimile, the lute corpus also presents a tantalising key for exploring the wider repertory of the period. Through Optical Music Recognition, we are gathering an expanding corpus of >500,000 pages transcribed from early-modern sources. Again, the nature of the material and how it has been gathered places limitations on the uses that can be made of it. Nonetheless, appropriate pattern discovery methods can support search and certain kinds of analysis.
Large-scale, cross-corpus analysis between vocal and instrumental works presents a particularly exciting opportunity, but requires adaptations to existing approaches. Lute tablature makes no distinction between the voices of a composition, making many conventional melodic features unavailable without further processing.
Building and using these corpora requires new approaches to computational musicology – not just algorithmic approaches, but also social and organisational – to ensure a strong future for corpus-based research